DRAFT DRAFT DRAFT
We decided to take a leap and try plot_ly and knit to HTML output for this milestone. The HTML page is hosted on GitHub Pages and is accessible at this address: ++TBD++ https://tin6150.github.io/phw251_group_z/milestone5_groupZ.html
This is intended as the final end-user report. Code blocks have been omitted for brevity but are available in our GitHub repo.
The California Department of Public Health Office of Health Equity (OHE) recently issued a new policy to create a public-private partnership to improve healthcare facilities in five rural counties across the state. Our team will evaluate and recommend which counties should receive development funding proposals, based on equitable selection criteria created by OHE. Specifically, we will explore the data to identify which rural counties have more non-homeowners, more aging individuals, and higher chronic mortality rates, and have received minimal funding from the Department of Health Care Access and Information (HCAI).
Source
Years and/or dates of data
Description of cleaning and creating new variables
Analytic methods
++ please review that they make sense, that I didn't omit anything, and that they meet the rubric requirements ++
We used 3 datasets for this project:
The first dataset is from the 2012 Census and contains demographic information for each of California's 58 counties. This includes population per square mile, median age, number of households that rent vs. own, ethnicity, gender, etc. We calculated the renter-to-owner ratio for each county. We then calculated the state-wide average age and population density and visually inspected the data to see how each county stacks up. We used the National Rural Development Partnership's definition, which is based on population density, to classify counties as rural. Eleven counties met this definition.
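The ratio and rural-selection steps can be illustrated with a short base-R sketch that runs on made-up toy values, not the real census data. Several names are hypothetical placeholders: the column names (`renter_hh`, `owner_hh`, `pop_per_sq_mi`) and the `rural_density_cutoff` value. The cutoff is not the actual NRDP threshold. The sketch also assumes that the state-wide average is a simple unweighted mean across counties.

```r
# Toy demographics table (made-up numbers, NOT the real 2012 census data).
# ASSUMPTION: column names are hypothetical placeholders.
demographics <- data.frame(
  County        = c("County A", "County B", "County C", "County D"),
  renter_hh     = c(1200, 50000, 800, 3000),
  owner_hh      = c(2400, 60000, 2000, 3000),
  median_age    = c(48.1, 35.2, 51.0, 39.9),
  pop_per_sq_mi = c(8.5, 1500, 3.2, 120),
  stringsAsFactors = FALSE
)

# ASSUMPTION: placeholder density cutoff; substitute the NRDP definition used
rural_density_cutoff <- 50

# Renter-to-owner ratio for each county
demographics$rent_own_ratio <- demographics$renter_hh / demographics$owner_hh

# State-wide reference value (assumed: unweighted mean across counties)
state_avg_age <- mean(demographics$median_age)

# Flag rural counties by population density
demographics$rural <- demographics$pop_per_sq_mi < rural_density_cutoff

# Table 1-style selection: rural AND median age above the state-wide average,
# sorted by decreasing renter-to-owner ratio
table1 <- subset(demographics, rural & median_age > state_avg_age)
table1 <- table1[order(-table1$rent_own_ratio), ]
print(table1[, c("County", "median_age", "rent_own_ratio")])
```

In this toy example, County A and County C pass both filters, and County A is listed first because its ratio is higher.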
The second dataset is the mortality surveillance data obtained from the CA Open Data Portal. It contains a breakdown of total mortality for each county by 15 disease areas. We used the CDC definition to filter for chronic diseases, and 10 of the 15 disease areas meet this definition. The data range from 2014 to 2020, but we were tasked to focus on the last 5 years, so we applied the filter Year >= 2016 (2016 through 2020). As tasked, we also filtered to Geography_Type == "Occurrence" and Strata == "Total Population" to avoid double counting. Any missing values were replaced with 0. Once the data were cleaned, we summed all chronic disease counts within each county. We joined this with the demographics data to obtain a chronic mortality rate over 5 years for each county.
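The filtering and aggregation steps can be sketched in base R on a tiny made-up table. Year, Geography_Type and Strata mirror the filters described above. The `Cause` and `Count` column names are assumptions. The two cause codes in `chronic_causes` are a placeholder, not the full list of 10 CDC-defined chronic diseases. The rate follows the convention noted later in this report: average yearly count divided by 2012 population.

```r
# Toy long-format mortality records (made-up numbers, NOT real data).
# ASSUMPTION: Cause and Count column names are hypothetical.
mortality <- data.frame(
  Year           = c(2015, 2016, 2017, 2018, 2016, 2016, 2016, 2019),
  County         = c(rep("County A", 6), "County B", "County B"),
  Geography_Type = c("Occurrence", "Occurrence", "Occurrence", "Residence",
                     "Occurrence", "Occurrence", "Occurrence", "Occurrence"),
  Strata         = c("Total Population", "Total Population", "Total Population",
                     "Total Population", "Age", "Total Population",
                     "Total Population", "Total Population"),
  Cause          = c("HTD", "HTD", "CAN", "HTD", "HTD", "HOM", "CAN", "HTD"),
  Count          = c(10, 12, NA, 9, 4, 3, 7, 6),
  stringsAsFactors = FALSE
)

# ASSUMPTION: placeholder subset of chronic cause codes
chronic_causes <- c("HTD", "CAN")

# Keep 2016-2020, occurrence-based, total-population rows for chronic causes
chronic <- subset(mortality,
                  Year >= 2016 &
                  Geography_Type == "Occurrence" &
                  Strata == "Total Population" &
                  Cause %in% chronic_causes)
chronic$Count[is.na(chronic$Count)] <- 0   # missing values -> 0

# Total chronic counts per county over the 5-year window
chronic_totals <- aggregate(Count ~ County, data = chronic, FUN = sum)

# Hypothetical 2012 populations for the toy counties
population <- data.frame(County = c("County A", "County B"),
                         pop_2012 = c(12000, 26000),
                         stringsAsFactors = FALSE)

n_years <- length(2016:2020)   # 5 years
rates <- merge(chronic_totals, population, by = "County")
rates$avg_yearly <- rates$Count / n_years
# Proportion of the population; multiply by 100000 for a per-100k rate
rates$mortality_rate <- rates$avg_yearly / rates$pop_2012
print(rates)
```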
The third dataset is the HCAI funding data, also obtained from the CA Open Data Portal. It contains healthcare spending for each county in 4 stages of project progression and is updated about every 2 weeks. We focused on the latest available data (August 11, 2022) and on projects with a status of "In Closure". Many rural counties showed a $0 amount, so we went back to double-check our selection code. The code was correct, because much of the funding goes to large, populous counties such as those in the greater Los Angeles and San Francisco areas. Rural counties did have some funding, for example in the "In Construction" phase. We nonetheless decided to focus on "In Closure" so that our improvement plan drives new spending toward rural counties with high and variable mortality rates.
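The funding selection can be sketched in base R as follows. The column names `OSHPD Project Status`, `Data Generation Date` and `Numeric_Cost` are taken from our milestone 4 code. The rows themselves are made-up toy values. The sketch assumes that a county may appear in several rows, so it sums per county. A county with no qualifying row simply drops out here and needs to be treated as $0 later.

```r
# Toy HCAI funding rows (made-up values, NOT real data)
funding <- data.frame(
  County = c("County A", "County A", "County B", "County C", "County C"),
  `OSHPD Project Status` = c("In Closure", "In Construction", "In Closure",
                             "In Closure", "In Closure"),
  `Data Generation Date` = as.Date(c("2022-08-11", "2022-08-11", "2022-07-28",
                                     "2022-08-11", "2022-08-11")),
  Numeric_Cost = c(5e6, 4.5e6, 2e6, 1e6, 3e6),
  check.names = FALSE,           # keep the spaces in the column names
  stringsAsFactors = FALSE
)

# Keep only the latest data generation date and "In Closure" projects
latest <- max(funding$`Data Generation Date`)
closure <- funding[funding$`Data Generation Date` == latest &
                   funding$`OSHPD Project Status` == "In Closure", ]

# Total "In Closure" funding per county
closure_by_county <- aggregate(Numeric_Cost ~ County, data = closure, FUN = sum)
print(closure_by_county)
```

In this toy data, County B drops out because its only "In Closure" row comes from an older snapshot.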
After cleaning and filtering the 3 datasets above, we joined them by county name. This lets us see which counties have high renter-to-owner ratios and high chronic mortality rates, and how much funding they receive.
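One way to perform this join is sketched below with base R `merge()`, using hypothetical per-county summary tables and toy values. An inner join links demographics and mortality. A left join then adds funding, so that counties with no "In Closure" record are kept and shown as $0 instead of being dropped. This is an illustrative choice; the join types used in our actual code may differ.

```r
# Hypothetical per-county summaries from the three cleaning steps (toy values)
demo_summary <- data.frame(County = c("County A", "County B", "County C"),
                           rent_own_ratio = c(0.50, 0.83, 0.40),
                           stringsAsFactors = FALSE)
mort_summary <- data.frame(County = c("County A", "County B", "County C"),
                           mortality_rate = c(2e-4, 1e-4, 3e-4),
                           stringsAsFactors = FALSE)
fund_summary <- data.frame(County = c("County A", "County C"),
                           closure_cost = c(5e6, 4e6),
                           stringsAsFactors = FALSE)

# Inner join: counties present in both demographics and mortality
combined <- merge(demo_summary, mort_summary, by = "County")
# Left join: keep every county even if it has no funding record
combined <- merge(combined, fund_summary, by = "County", all.x = TRUE)
# No "In Closure" funding record -> $0
combined$closure_cost[is.na(combined$closure_cost)] <- 0
print(combined)
```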
++TEMP NOTE++
I (Tin) changed the filter from Year>2016 to Year>=2016 to get a proper 5-year bracket;
results are not significantly different from Milestone 4.
Table 1 shows CA counties that are rural (per the National Rural Development Partnership's definition) and have a median age greater than the state-wide average.
We have pre-sorted them by decreasing renter-to-owner ratio. None of these counties has a renter-to-owner ratio higher than the state-wide average, but their ratios are still fairly high. All of them also have $0 in the latest HCAI funding for projects in the "In Closure" state.
As background reference, across all 58 CA counties, we found these statistics:
Values of median age, renter-to-owner ratio, or chronic mortality that are higher than the state-wide average are highlighted in blue.
Note that the mortality rate is calculated using the latest available population data (2012). The number of chronic cases for each county is the average number of yearly cases from 2016 to 2020.
Figure 1 displays several characteristics of interest in more detail for rural counties. For each county, the first two subplots depict median age and renter-to-owner ratio, respectively. The third subplot depicts the funding amount each county received for projects with a status of "In Construction" as of August 2022. We explored this variable in addition to funding for "In Closure" projects because it is an important factor in deciding which counties need funding most urgently.
(++Please add what conclusion the reader should draw from this table++)
Figure 2 is a bar graph of the chronic disease mortality rate across the 11 rural counties in CA (as defined by the National Rural Development Partnership).
The list of chronic diseases follows the CDC definition.
We note that we don't have disease data for Alpine or Sierra County.
This table shows the most common chronic disease in each rural county, along with the number of cases reported in 2020. Chronic disease data were not available for Alpine and Sierra counties. As funding expands in the 5 selected counties, special emphasis should be placed on heart disease, because it is the most common chronic cause of mortality in most of these counties. *HTD = Heart Disease, CAN = Cancer
| County | Most Common Chronic Disease | Number Reported (2020) |
|---|---|---|
| Alpine | Not Available | 0 |
| Colusa | HTD | 172 |
| Inyo | HTD | 226 |
| Lassen | HTD | 352 |
| Mariposa | HTD | 200 |
| Modoc | HTD | 68 |
| Mono | CAN | 22 |
| Plumas | HTD | 226 |
| Sierra | Not Available | 0 |
| Siskiyou | CAN | 756 |
| Trinity | CAN | 62 |
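Picking the most common chronic disease per county amounts to taking, for each county, the cause with the largest count. A minimal base-R sketch on made-up 2020 counts is shown below; the column names are assumptions. Missing values have already been replaced with 0 upstream, so a county whose counts are all zero is labelled "Not Available", mirroring the table above.

```r
# Toy 2020 counts by county and cause (made-up numbers, NOT real data).
# ASSUMPTION: Cause and Count column names are hypothetical.
counts_2020 <- data.frame(
  County = c("County A", "County A", "County B", "County B", "County C"),
  Cause  = c("HTD", "CAN", "HTD", "CAN", "HTD"),
  Count  = c(40, 25, 10, 18, 0),
  stringsAsFactors = FALSE
)

# For each county, keep the row with the largest count
top_cause <- do.call(rbind, lapply(split(counts_2020, counts_2020$County),
  function(d) {
    best <- d[which.max(d$Count), ]
    # All-zero counts (missing data replaced with 0) -> "Not Available"
    if (best$Count == 0) best$Cause <- "Not Available"
    best
  }))
rownames(top_cause) <- NULL
print(top_cause)
```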
Maximum of 1,000 words
++ please improve on this ++
While no county perfectly fits all three attributes (rural, high renter-to-owner ratio, and high median age), our visualizations and analysis offer a holistic view of which counties best fit the selection criteria. Our first step narrowed down which counties are "rural" as defined by the National Rural Development Partnership and have a median age higher than the state-wide average. These 8 counties are shown in Table 1, which also includes data on the other selection criteria. None of these rural counties had HCAI funding for projects with an "In Closure" status as of August 2022.
Figure 1 provides a visual display of demographics and funding in rural counties. Most notably, the third subplot shows funding received for projects "In Construction" as of August 2022. All of these counties had $0 for projects with "In Closure" status, yet Inyo and Siskiyou each reported over $4 million in funding for "In Construction" projects. Counties that reported $0 in both categories include Alpine, Mariposa, Modoc, and Sierra, and Sierra additionally has the highest median age. Across all subplots in Figure 1, three rural counties stand out as being of particular interest: Mariposa, Modoc, and Plumas.
Figure 2 ranks the 11 rural California counties by decreasing chronic mortality rate. The 5 counties with the highest mortality rates are Siskiyou, Inyo, Mariposa, Plumas, and Modoc. These counties align with those found to have high median age and large renter-to-owner ratios in Table 1 above. Figure 2 also helps narrow down our selection. Alpine and Sierra have not recently received funding, but they have the lowest recorded mortality rates, noting that disease data are not available for either county. Although Figure 1 shows that Inyo and Siskiyou have received funding, their mortality rates are quite high, indicating a greater need for funding.
Ultimately, our analysis identified Inyo, Siskiyou, Mariposa, Modoc, and Plumas counties to receive HCAI funding.
++ please expand, what the other visuals says, etc ++
# Aubrey's original plot from Milestone 4: don't feel like it adds much info; placing it here just in case you all feel differently.
The following boxplot summarizes chronic disease mortality rates for all CA counties, grouped by HCAI funding amount for "In Closure" projects as of August 2022. Funding amounts were categorized as "High Funding" if above the mean amount, "Low Funding" if below the mean, and "No Funding" if no funding for "In Closure" projects was reported.
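library(dplyr)
library(lubridate)
library(plotly)

# "In Closure" funding from the 2022-08-11 snapshot, categorized against
# 12239849 (the mean "In Closure" amount used in milestone 4)
funding_chronic <- funding_data %>%
  filter(`OSHPD Project Status` == "In Closure") %>%
  filter(`Data Generation Date` == as_date("2022-08-11")) %>%
  mutate(funding_amount = case_when(
    Numeric_Cost == 0 ~ "No Funding",
    Numeric_Cost > 12239849 ~ "High Funding",
    TRUE ~ "Low Funding"  # non-zero and at or below the mean
  )) %>%
  # add chronic mortality (pct) and rural class for each county
  inner_join(demographics_chronic, by = "County") %>%
  select(pct, County, funding_amount, rural_class, Numeric_Cost)

# Box plot of chronic mortality rate for each funding category
plot_ly(
  funding_chronic,
  x = ~funding_amount,
  y = ~pct,
  color = ~funding_amount,
  type = "box"
) %>%
  layout(
    title = "Chronic Disease Mortality Rates & HCAI Funding",
    xaxis = list(title = "HCAI Funding (In Closure, Aug 2022)"),
    yaxis = list(title = "Chronic Disease Rate"))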